-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RELEASE] raft v23.12 #2020
Merged
Merged
[RELEASE] raft v23.12 #2020
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
This PR builds conda packages using CUDA 12 on ARM. Closes #1834. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: #1853
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Opening a fresh PR in hopes to reset gh-actions runners to v3. Authors: - Corey J. Nolet (https://github.com/cjnolet) - Paul Taylor (https://github.com/trxcllnt) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Dante Gama Dessavre (https://github.com/dantegd) - Ray Douglass (https://github.com/raydouglass) - Ben Frederickson (https://github.com/benfred) URL: #1868
Branch 23.12 merge 23.10
Building on cuda 12.2 shows errors like ``` /code/raft/cpp/include/raft/spatial/knn/detail/ball_cover/registers-inl.cuh(177): error #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function KeyValuePair<value_t, value_idx> shared_memV[kNumWarps * warp_q]; ``` Fix by using default constructors for structures in shared memory, even trivial constructors will cause this issue Authors: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1870
I got errors when compiling a program using raft NN-descent. This PR fixes the bug. ## Error e.g. ``` /home/.../include/raft/neighbors/detail/nn_descent.cuh(1158): error: invalid narrowing conversion from "unsigned long" to "int" h_rev_graph_old_{static_cast<size_t>(nrow_ * NUM_SAMPLES)}, ``` - nvcc ``` Built on Tue_Aug_15_22:02:13_PDT_2023 Cuda compilation tools, release 12.2, V12.2.140 Build cuda_12.2.r12.2/compiler.33191640_0 ``` - gcc ``` gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 ``` - thrust : 2.2 (This is the cause of this error [[detail](#1869 (comment))]) # Change Use `Class(...)` instead of `Class{...}`. # Cause The NN-descent code calls constructors of `thrust::host_vector` as shown below: ```cpp graph_host_buffer_{static_cast<size_t>(nrow_ * DEGREE_ON_DEVICE)}, ``` However, this constructor is regarded as a list initialization. This is the same as the following code outputting 1 instead of 2. ```cpp #include <iostream> #include <vector> int main() { std::vector<float> list{2}; std::cout << list.size() << std::endl; } ``` [detail](https://en.cppreference.com/w/cpp/language/list_initialization) Authors: - tsuki (https://github.com/enp1s0) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Divye Gala (https://github.com/divyegala) URL: #1869
NN-Descent was using `int` type for indexing in `mdarray`, however this was causing an overflow when the product of all extents was greater than `int`. This PR also adds/fixes: - Missing dependencies for `raft-ann-bench` development environment - Exposes NN Descent iterations to use in CAGRA benchmarks Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Ray Douglass (https://github.com/raydouglass) URL: #1875
Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1867
# Purpose This PR provides a utility for copying between generic mdspans. This includes between host and device, between mdspans of different layouts, and between mdspans of different (convertible) data types ## API `raft::copy(raft_resources, dest_mdspan, src_mdspan);` # Limitations - Currently does not support copies between mdspans on two different GPUs - Currently not performant for generic host-to-host copies (would be much easier to optimize with submdspan for padded layouts) - Submdspan with padded layouts would also make it easier to improve perf of some device-to-device copies, though perf should already be quite good for most device-to-device copies. # Design - Includes optional `RAFT_DISABLE_CUDA` build definition in order to use this utility in CUDA-free builds (important for use in the FIL backend for Triton) - Includes a new `raft::stream_view` object which is a thin wrapper around `rmm::stream_view`. Its purpose is solely to provide a symbol that will be defined in CUDA-free builds and which will throw exceptions or log error messages if someone tries to use a CUDA stream in a CUDA-free build. This avoids a whole bunch of ifdefs that would otherwise infect the whole codebase. - Uses (roughly in order of preference): `cudaMemcpyAsync, std::copy, cublas, custom device kernel, custom host-to-host transfer logic` for the underlying copy - Provides two different headers: `raft/core/copy.hpp` and `raft/core/copy.cuh`. This is to accommodate the custom kernel necessary for handling completely generic device-to-device copies. See below for more details. ## Details on the header split For many instantiations, even those which involve the device, we do not require nvcc compilation. If, however, we determine at compilation time that we must use a custom kernel for the copy, then we must invoke nvcc. We do not wish to indicate that a public header file is a C++ header when it is a CUDA header or vice versa, so we split the definitions into separate `hpp` and `cuh` files, with all template instantiations requiring the custom kernel enable-if'd out of the hpp file. Thus, the cuh header can be used for _any_ mdspan-to-mdspan copy, but the hpp file will not compile for those specific instantiations that require a custom kernel. The recommended workflow is that if a `cpp` file requires an mdspan-to-mdspan copy, first try the `hpp` header. If that fails, the `cpp` file must be converted to a `cu` file, and the `cuh` header should be used. For source files that are already being compiled with nvcc (i.e. `.cu` files), the `cuh` header might as well be used and will not result in any additional compile time penalty. # Remaining tasks to leave WIP status - [x] Add benchmarks for copies - [x] Ensure that new function is correctly added to docs # Follow-up items - Optimize host-to-host transfers using a cache-oblivious approach with SIMD-accelerated transposes for contiguous memory - Test cache-oblivious device-to-device transfers and compare performance - Provide transparent support for copies between devices. ## Relationship to mdbuffer This utility encapsulates a substantial chunk of the core logic required for the mdbuffer implementation. It is being split into its own PR both because it is useful on its own and because the mdbuffer work has been delayed by higher priority tasks. Close #1779 Authors: - William Hicks (https://github.com/wphicks) - Tarang Jain (https://github.com/tarang-jain) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Divye Gala (https://github.com/divyegala) URL: #1818
This PR adds a primitive to compute recall score for Nearest Neighbor Algorithms Authors: - Divye Gala (https://github.com/divyegala) - Corey J. Nolet (https://github.com/cjnolet) - Ben Frederickson (https://github.com/benfred) - tsuki (https://github.com/enp1s0) - Micka (https://github.com/lowener) - William Hicks (https://github.com/wphicks) Approvers: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) URL: #1860
Adding instructions to how to use the ANN benchmark containers, as well as general benchmark doc improvements. Authors: - Dante Gama Dessavre (https://github.com/dantegd) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1833
Update to use non deprecated signatures for `rapids_export` functions Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1884
Closes #1813 This PR adds FAISS CPU indexes to the raft-Ann-bench benchmarks, and adjusts the build accordingly. In addition, this PR also updated the FAISS version to include the RAFT-enabled version, which required removing the FAISS conda packages and building FAISS from source. Authors: - Corey J. Nolet (https://github.com/cjnolet) - Robert Maynard (https://github.com/robertmaynard) - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Divye Gala (https://github.com/divyegala) - Ray Douglass (https://github.com/raydouglass) URL: #1814
Forward-merge branch-23.10 to branch-23.12
This PR switches back to using `branch-23.12` for CI workflows because the CUDA 12 ARM conda migration is complete. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Jake Awe (https://github.com/AyodeAwe) URL: #1886
- Use strides to determine whether we can use a contiguous layout. Useful for accepting inputs with stride information like numpy array or legate stores. The code is mostly copied and pasted from mdspan layout_left/right constructors. These constructors can optionally accept a stride layout if the underlying memory is contiguous, otherwise throw an exception. We change the exception into a boolean return value. Authors: - Jiaming Yuan (https://github.com/trivialfis) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Divye Gala (https://github.com/divyegala) URL: #1861
The python raft-ann-bench code launches subprocesses to execute C++ code. These scripts weren't checking the return values of the C++ programs though, and just waiting for them to finish. Fix this to propogate the failures up as exceptions by checking the subprocess return value. After this change, having a failing subprocess looks something like: ``` 2023-10-12 10:54:55 [info] Using the dataset file '/home/ben/code/raft/python/raft-ann-bench/datasets/glove-100-inner/base.fbin' terminate called after throwing an instance of 'std::runtime_error' what(): read header of BinFile failed: /home/ben/code/raft/python/raft-ann-bench/datasets/glove-100-inner/base.fbin Traceback (most recent call last): File "/home/ben/code/raft/python/raft-ann-bench/src/raft-ann-bench/run/__main__.py", line 324, in <module> main() File "/home/ben/code/raft/python/raft-ann-bench/src/raft-ann-bench/run/__main__.py", line 309, in main run_build_and_search( File "/home/ben/code/raft/python/raft-ann-bench/src/raft-ann-bench/run/__main__.py", line 113, in run_build_and_search subprocess.run(cmd, check=True) File "/home/ben/miniconda3/envs/raft/lib/python3.10/subprocess.py", line 526, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['/home/ben/miniconda3/envs/raft/bin/ann/RAFT_CAGRA_ANN_BENCH', '--build', '--data_prefix=/home/ben/code/raft/python/raft-ann-bench/datasets/', '--benchmark_out_format=json', '--benchmark_out=/home/ben/code/raft/python/raft-ann-bench/datasets/glove-100-inner/result/build/raft_cagra-10-10000.json', '/home/ben/code/raft/python/raft-ann-bench/src/raft-ann-bench/run/conf/temporary_glove-100-inner.json']' died with <Signals.SIGABRT: 6>. ``` Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Divye Gala (https://github.com/divyegala) URL: #1897
In ANN benchmark, when throughput mode is enabled, it is important that each thread works with a separate set of queries, otherwise cache effects can distort the results. This PR sets a different starting offset for each thread. Authors: - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1981
Adds a CI check to verify the the libraft.so doesn't have any public weak kernel symbols. Authors: - Robert Maynard (https://github.com/robertmaynard) - AJ Schmidt (https://github.com/ajschmidt8) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #1935
- fix matrix::detail::select::radix::calc_chunk_size() for one-block kernel - use `calc_buf_len()` rather than `len` as the buffer size of one-block kernel - reduce register footprint of one-block kernel by reducing the number of buffer pointers - reduce the buffer size by 1/8 for all radix select functions Resolve #1823 Authors: - Yong Wang (https://github.com/yong-wang) - Corey J. Nolet (https://github.com/cjnolet) - Ben Frederickson (https://github.com/benfred) - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Artem M. Chirkin (https://github.com/achirkin) - Ben Frederickson (https://github.com/benfred) URL: #1878
Authors: - Divye Gala (https://github.com/divyegala) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1956
…ns (#1968) Currently dask versions are pinned as part of every release cycle and then unpinned for the next development cycle across all of RAPIDS. This introduces a great deal of churn. To centralize the dependency, we have created a metapackage to manage the required dask version and this PR introduces that metapackage as a dependency of raft_dask. xref: rapidsai/cudf#14364 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Divye Gala (https://github.com/divyegala) - Jake Awe (https://github.com/AyodeAwe) URL: #1968
This PR fixes a potential out of bounds access. While building the reverse graph, we count the number of potential edges in `rev_graph_count` https://github.com/rapidsai/raft/blob/047bfb2a7a24a97a4bece0b553521533868e7889/cpp/include/raft/neighbors/detail/cagra/graph_core.cuh#L204-L205 While we store only max `output_graph_degree` neighbors, the `rev_graph_count` can be larger than `output_graph_degree`. The loop that uses `rev_graph` is updated to take this limit into account. Authors: - Tamas Bela Feher (https://github.com/tfeher) Approvers: - tsuki (https://github.com/enp1s0) - Corey J. Nolet (https://github.com/cjnolet) URL: #1987
… time (#1989) We can safely assume all cuda streams have been created before we run benchmark experiments. With the sensitivity to latencies for smaller batch sizes, this has an impact on overall throughput. Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Ben Frederickson (https://github.com/benfred) URL: #1989
A few updates to the `update-version.sh` script Authors: - Ray Douglass (https://github.com/raydouglass) Approvers: - Jake Awe (https://github.com/AyodeAwe)
Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Divye Gala (https://github.com/divyegala) URL: #1990
With #1878 merged, the performance of the radix select algorithms is much improved and we no longer need to incorporate the faiss block select algorithm. With #1878 merged, faiss block select goes from being the 3rd ranked selection algorithm, to the 5th. This regenerates the heuristic function with the latest benchmark times, and removes the faiss block select in favour of kWarpImmediate and kRadix11bitsExtraPass. Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #1985
This PR removes static checks for serialization size. Upstream changes like rapidsai/rmm#1370 have altered these sizes and break RAFT CI. An alternative approach to verifying serialization will be developed. Authors: - Corey J. Nolet (https://github.com/cjnolet) - Bradley Dice (https://github.com/bdice) Approvers: - Divye Gala (https://github.com/divyegala) - Mark Harris (https://github.com/harrism) URL: #1997
CAGRA index can be constructed using existing device_mdarrays, in which case just reference to these arrays are stored. This way allocations are managed outside the index, and we can customize how these allocations are made. This PR - modifies the CAGRA ANN bench wrapper to manage the allocations locally, - add options for the json file to specify whether the graph / dataset is allocated in device / host_pinned / host_huge_page memory. Authors: - Corey J. Nolet (https://github.com/cjnolet) - Tamas Bela Feher (https://github.com/tfeher) - Robert Maynard (https://github.com/robertmaynard) Approvers: - Micka (https://github.com/lowener) - Divye Gala (https://github.com/divyegala) - Tamas Bela Feher (https://github.com/tfeher) URL: #1896
Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Robert Maynard (https://github.com/robertmaynard) URL: #2006
This adds support for iterating over batches of nearest neighbors in the brute force knn. This lets you query for the nearest neighbors, and then filter down the results - and if you have filtered out too many results, get the next batch of nearest neighbors. The challenge here is balancing memory consumption versus efficieny: we could store the results of the full gemm in the distance calculation - but for large indices this discards the benefits of using the tiling strategy and risks running OOM. Instead we exponentially grow the number of neighbors being returned, and also cache both the query norms and index norms between calls. Authors: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1947
- [x] Codepacking for compressed on-device (flat) PQ codes - [x] Testing Authors: - Tarang Jain (https://github.com/tarang-jain) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1826
pylibraft specifies its numpy pinning incorrectly in conda packages. It is only a runtime dependency, not a build dependency. Because of run-exports from numpy, the resulting pinning on pylibraft requires a very new numpy version, and is thus causing cuml to fall back to old raft builds. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - Mark Harris (https://github.com/harrism) - Vyas Ramasubramani (https://github.com/vyasr) - Ray Douglass (https://github.com/raydouglass) URL: #2013
We were relying on spdlog/fmt to be brought in transitively before, which was compatible with FAISS. It looks like they are no longer being brought in transitively and so we are bringing in a newer version at the cpm/cmake level which is breaking our raft-ann-bench-cpu build. Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Divye Gala (https://github.com/divyegala) - Jake Awe (https://github.com/AyodeAwe) URL: #2018
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
RAPIDS repos are using the `main` branch of https://github.com/actions/labeler which recently introduced [breaking changes](https://github.com/actions/labeler/releases/tag/v5.0.0). This PR pins to the latest v4 release of the labeler action until we can evaluate the changes required for v5. Authors: - Ray Douglass (https://github.com/raydouglass) Approvers: - AJ Schmidt (https://github.com/ajschmidt8)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
❄️ Code freeze for
branch-23.12
and v23.12 releaseWhat does this mean?
Only critical/hotfix level issues should be merged into
branch-23.12
until release (merging of this PR).What is the purpose of this PR?
branch-23.12
intomain
for the release